1. INTRODUCTION

According to National Highway Traffic Safety Administration (NHTSA) studies from 2016, 2018, and
2019, about 96% of all car accidents are caused by human error [1]-[3]. These studies identify many
factors that contribute to vehicle accidents, such as drowsiness and fatigue caused by lack of sleep,
drunk driving, and looking down at a phone to read or send a text message. Vehicle accidents can be
avoided with advanced driver assistance systems (ADAS) [4]. ADAS supports the driver with information
about the vehicle’s surrounding environment to prevent accidents. The major ADAS safety applications
include automatic emergency braking, traffic sign recognition, lane departure alert, pedestrian
avoidance, and blind spot detection [5]. These applications are key to saving lives; they use the
latest sensor systems and run vision-based algorithms. Our goal is to design a reliable, low-cost driver
safety system that supports the modern automobile design process. Our work in this paper focuses
on the vehicle interior, especially on the driver’s facial traits. Yawning and facial expressions are among
the most prominent traits that help in driver fatigue detection [6]-[8]. Yawning is a commonly known sign
of drowsiness, head drooping indicates fatigue, and anger, fear, surprise, and sadness negatively affect
the driver [9], [10]. The system will be more robust if it integrates all these decision-making factors.

Deep learning approaches have been used to observe a driver’s behavior based on face monitoring.
Driving behavior is the set of actions a driver performs while driving. It can be classified into normal
and abnormal driving: normal driving is the typical daily behavior, whereas abnormal driving, such as
drowsy or drunk driving, is uncommon and results from physical or mental factors [11]. Deep
learning is a form of machine learning that enables computers to learn from experience and
understand the world [12], [13]. One class of deep learning models is the convolutional neural
network (CNN), which is commonly used for image analysis [14]-[16]. It mimics the connectivity of neurons
in the human brain [17], [18]. Figure 1 shows the CNN structure, which consists of three essential parts:
input images, feature extraction, and classification. A CNN needs less preprocessing of input images than
other classification algorithms. The feature extraction stage is made up of multiple layers, including
convolutional layers, rectified linear unit (ReLU) activation functions, pooling layers, and normalization
layers. The classification stage consists of fully connected layers and one classification layer [19], [20].
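
To make the feature extraction stage more concrete, the following minimal sketch applies one convolution, a ReLU activation, and a 2x2 max-pooling step to a toy image using plain Python lists. The function names (`conv2d_valid`, `relu`, `max_pool_2x2`), the 6x6 input, and the hand-written edge kernel are illustrative assumptions and are not part of the proposed system; a trained CNN learns its kernels from data.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Replace negative responses with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map: Matrix) -> Matrix:
    """Keep the largest value of each non-overlapping 2x2 window."""
    rows = len(feature_map) // 2 * 2
    cols = len(feature_map[0]) // 2 * 2
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Toy 6x6 image (assumption): bright, dark, bright vertical bands.
image = [[1.0, 1.0, 0.0, 0.0, 1.0, 1.0] for _ in range(6)]
# Hand-written vertical-edge kernel (assumption); a CNN would learn this.
kernel = [[-1.0, 0.0, 1.0] for _ in range(3)]

features = max_pool_2x2(relu(conv2d_valid(image, kernel)))
print(features)
```

In this toy case the bright-to-dark edge produces negative responses that ReLU sets to zero, while the dark-to-bright edge survives pooling. A classification stage would then operate on such pooled feature maps.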
Figure 1. The CNN architecture [13]


One of the main reasons for the superiority of deep learning over traditional image processing
methods is the way the input data are represented; performance improves with good data
representation [21]-[23]. Thus, many researchers have focused on feature extraction from raw data based on
machine learning methods [24]. Traditional methods such as scale-invariant feature transform (SIFT),
histogram of oriented gradients (HOG), and speeded up robust features (SURF) need a lot of time and effort
for feature extraction, whereas deep learning algorithms perform it automatically [25], [26].
This work aims to detect facial landmarks through a particular group of points, such as the corners of the
eyes, mouth, or nose. Monitoring facial information can help in understanding the driver’s state and
preventing fatal accidents. The main contribution of this paper is the use of the EfficientNet CNN model
for facial landmark prediction; the predicted landmarks are mapped to recognize the shape of the human face.


2. DROWSINESS DETECTION METHODS 

There are many drowsiness and fatigue detection methods based on driver behavior; these methods
depend on several parameters to determine the driver’s fatigue [27]. Yawning, eye blinking, and facial
features are among these parameters. Various conventional methods for detecting driver drowsiness are
summarized below, followed by the newest deep learning methods.


2.1. The conventional methods

— Eye blink monitoring method: In Rahman et al. [28], an eye-blink strategy was proposed to detect
drowsiness. The Harris corner algorithm is used to detect the eye corners on both sides. The open/closed
eye state is determined by calculating the distance from the center to the bottom of the eye
within a known time interval.

— Yawning detection method: In Abtahi et al. [29], the driver fatigue identification strategy is based on
estimating the driver’s physiological, social, and execution state. To recognize fatigue, they focused on
yawn identification with three strategies: recognizing the face and mouth; recognizing the face based on
layout coordination and then the mouth using the shading condition; and finally using the Viola-Jones
algorithm to recognize the face and mouth and identify a yawn.

— Wearable electroencephalographic (EEG) system method: In Li et al. [30], driver sleepiness was
recognized based on the EEG signal. In this work, 20 subjects (15 for training and 5 for testing) were used
to identify driver drowsiness from brain activity. They proposed three levels of drowsiness: alert,
early warning, and drowsy. A support vector machine is used to determine the level of
drowsiness, and the system achieves a precision of 91.92% for drowsy, 83.78% for early warning,
and 91.25% for alert.

— A hybrid strategy: In Oliveira et al. [31], a real-road experiment was used to compare the
performance of electrooculogram (EOG) and electrocardiogram (ECG) methods for detecting driver
drowsiness. The hybrid strategy of combining both methods increased the system robustness compared with
using the EOG method alone.

— Multiple sensors method: In Anand et al. [32], an Arduino-based system with various sensors, including
an eye blink sensor, an alcohol sensor, and a heartbeat sensor, was proposed to detect driver
drowsiness. The system monitors the sensors’ status; in any abnormal state, the vehicle automatically
slows down and stops.

— Facial features monitoring method: In Manu [33], face detection and skin segmentation were
proposed to detect driver weariness. An edge detection algorithm is used for eye tracking, and the
K-means algorithm is used for yawning detection; the system accuracy was 94.58%.

In addition to these studies, many other studies and tests have been conducted to detect driver fatigue
and drowsiness. Although some studies obtained good results, these methods have computational and
practical complexities; moreover, they cannot satisfy real-time requirements.


2.2. The deep learning methods 

The superiority of deep learning over traditional methods in solving complex problems has led to its
wide use. Deep learning is extensively used for computer vision tasks such as object detection,
emotion recognition, and image classification. There are several approaches to detecting drowsiness using
deep learning. For instance, in Vijayan and Sherly [34], a combination of three deep learning models was
proposed to extract facial features and compose a feature-fused architecture (FFA). Experimentally,
the ResNet50, VGG16, and InceptionV3 CNN models were used; among the three networks and the
FFA, the InceptionV3 model showed an accuracy of 78%. In Park et al. [35], the researchers followed the
same approach for drowsiness detection by combining the results of AlexNet, VGG-FaceNet, and
FlowNet with fully connected layers. The proposed architecture, called deep drowsiness detection (DDD),
achieved a drowsiness detection accuracy of 73.06%. In a similar approach [36], the
researchers used a hybrid of a CNN and long short-term memory (LSTM) for drowsiness detection.


3. METHOD 

Figure 2 shows the proposed system, which consists of three steps. The first step is the detection of
the driver’s face, and the second is facial landmark prediction based on the EfficientNet model. The
final step is drowsiness detection by counting eye blinks and yawns. Whenever drivers
feel drowsy, the eye closure frequency rises. If this rate exceeds the threshold value, the
system generates an alarm. The multi-task cascaded convolutional network (MTCNN) [37] is used for
face detection because it is one of the most accurate and fastest face detectors. A CNN model based on the
EfficientNet network is used for landmark prediction. The Google Brain team developed the baseline of
the EfficientNet network and, as its name suggests, proposed a more efficient model. Unlike
traditional convolutional network design, which primarily concentrates on
selecting the appropriate layer architecture, EfficientNet employs compound scaling to expand
the model size (depth, width, and image resolution) without modifying the predefined architecture of the
baseline model, which enhances the model accuracy [38], as shown in Figure 3(a) and Figure 3(b).

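In the compound scaling method, a compound coefficient ($\phi$) is used to scale the network
dimensions, width ($w$), depth ($d$), and resolution ($r$), in a principled way:

$$d = \alpha^{\phi}, \quad w = \beta^{\phi}, \quad r = \gamma^{\phi}, \qquad \text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,\ \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1 \tag{1}$$

where $\alpha$, $\beta$, and $\gamma$ are constants that can be determined by a small grid search.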
Figure 2. The proposed system block

Figure 3. Model scaling: (a) a baseline of the conventional network and (b) a conventional scaling of
the network dimensions
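
As a rough illustration of compound scaling, the sketch below computes the depth, width, and resolution multipliers for a given compound coefficient. The default constants (1.2, 1.1, 1.15) are the values reported for the EfficientNet baseline in [38] and are used here as an assumption, since this paper does not state the constants of its own model; the names `ScalingFactors`, `compound_scale`, and `flops_multiplier` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingFactors:
    depth: float
    width: float
    resolution: float


def compound_scale(phi: float, alpha: float = 1.2, beta: float = 1.1,
                   gamma: float = 1.15) -> ScalingFactors:
    """Return the multipliers d = alpha^phi, w = beta^phi, r = gamma^phi."""
    if min(alpha, beta, gamma) < 1.0:
        raise ValueError("alpha, beta, and gamma must all be >= 1")
    return ScalingFactors(alpha ** phi, beta ** phi, gamma ** phi)


def flops_multiplier(factors: ScalingFactors) -> float:
    """Approximate cost growth: proportional to d * w^2 * r^2."""
    return factors.depth * factors.width ** 2 * factors.resolution ** 2


for phi in (0, 1, 2):
    factors = compound_scale(phi)
    print(phi, factors, round(flops_multiplier(factors), 3))
```

With these constants, each unit increase of the coefficient roughly doubles the computational cost, which is the purpose of the constraint on the three constants.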


From the compound scaling equation, the compound coefficient controls how many resources are available to
expand the model, while the depth, width, and resolution constants determine how these additional
resources are distributed across the network. The facial landmarks predicted by the EfficientNet model are
used to locate the coordinates of the facial features. The coordinates shown in Figure 4 are used to track
the eye and mouth distance ratios.

We use the eye and mouth contours to determine drowsiness. The eye aspect ratio (EAR) in (2) relates the
vertical distances between the eyelids to the horizontal distance from one eye corner to the other, based
on the Euclidean distance formula. In the same way, the distance between the lip edges is calculated to
estimate the yawn frequency.


$$\mathrm{EAR} = \frac{\lVert P_2 - P_6 \rVert + \lVert P_3 - P_5 \rVert}{2\,\lVert P_1 - P_4 \rVert} \tag{2}$$
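
The EAR formula can be sketched in a few lines of Python. This is a minimal illustration, assuming the six eye landmarks are given in the order P1 to P6, with P1 and P4 as the eye corners, P2 and P3 on the upper eyelid, and P6 and P5 on the lower eyelid; the function names and the sample coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """Compute EAR from six landmarks ordered P1..P6 (assumed ordering)."""
    if len(eye) != 6:
        raise ValueError("expected exactly six eye landmarks")
    p1, p2, p3, p4, p5, p6 = eye
    horizontal = euclidean(p1, p4)
    if horizontal == 0.0:
        raise ValueError("eye corners P1 and P4 coincide")
    vertical = euclidean(p2, p6) + euclidean(p3, p5)
    return vertical / (2.0 * horizontal)


# Made-up coordinates for an open and a nearly closed eye.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]

print(eye_aspect_ratio(open_eye), eye_aspect_ratio(closed_eye))
```

Because the ratio divides by the eye width, it stays roughly constant while the eye is open regardless of the face size in the image, and it drops toward zero as the eyelids close.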
Physically, when the driver gets drowsy, the eyelids move closer together, while in a yawn the lips move
apart. In a yawning state, the distance between the lips increases while the space between the eyelids
decreases. The frequency ($f$) of both eye closures and yawns is calculated within a specific time window
and compared with the drowsiness threshold ($\theta$). If $f > \theta$, the alarm alerts the driver by
displaying a message on the frame.
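
The frequency test described above can be sketched as a sliding-window event counter. This is only an illustration under stated assumptions: the class name `EventRateMonitor`, the 10-second window, the threshold of 2 events, and the EAR cut-off of 0.2 for a closed eye are hypothetical values and are not taken from the paper.

```python
from collections import deque
from typing import Deque


class EventRateMonitor:
    """Counts events (eye closures or yawns) inside a sliding time window."""

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events: Deque[float] = deque()
        self._active = False  # True while the eye is closed or the mouth is open

    def update(self, timestamp: float, event_active: bool) -> bool:
        """Feed one frame; return True when the event count exceeds the threshold."""
        # Count an event only on the transition from inactive to active.
        if event_active and not self._active:
            self._events.append(timestamp)
        self._active = event_active
        # Forget events that fell out of the time window.
        while self._events and timestamp - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.threshold


EAR_CLOSED = 0.2  # assumed cut-off below which the eye counts as closed

monitor = EventRateMonitor(window_seconds=10.0, threshold=2)
ear_per_second = [0.3, 0.1, 0.3, 0.1, 0.3, 0.1, 0.3]  # one made-up sample per second
alarms = [monitor.update(float(t), ear < EAR_CLOSED)
          for t, ear in enumerate(ear_per_second)]
print(alarms)
```

A second monitor instance with its own threshold could track yawns in the same way, using a mouth-opening measure instead of the EAR.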
Figure 4. The coordinates of facial landmarks


4. EXPERIMENTAL WORK AND RESULTS 

The 300-W face dataset, together with other images, is used for facial landmark prediction. The dataset
covers a diversity of expressions, identities, illumination conditions, head poses, and face sizes. The
images were semi-automatically annotated with the 68-point mark-up. The model was trained on an NVIDIA
RTX 3060 graphics processing unit (GPU), based on NVIDIA’s 2nd-generation RTX architecture, in a
TensorFlow environment. The images are resized to 128×128, and the dataset was split into 80% for
training and 20% for testing. The
model was trained with a batch size of 100 for 100 epochs. The Adam optimizer with a learning rate of
$10^{-?}$ and a categorical cross-entropy loss function were used in the experimental work to predict the
facial landmarks
with the EfficientNet model. The model accuracy is approximately 82% with a dropout rate of 0.5, as shown in
Figure 5. The model is robust in detecting facial landmarks, especially in the area around the eyes and nose.
Finally, these landmarks are mapped to calculate the aspect ratios of the eyes and the mouth to achieve
yawn detection and closed-eye recognition. To test the precision of the drowsiness detection system, many
videos were recorded using different kinds of cameras with different resolutions.
Figure 6 shows a test result from a video of a driver with closed eyes and an open mouth.
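
The data preparation described above can be sketched as follows. This is a minimal illustration, assuming that the landmark annotations are rescaled together with the image when it is resized to 128×128 and that the 80/20 split is a seeded random shuffle; neither detail is stated in the paper, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def scale_landmarks(landmarks: Sequence[Point],
                    original_size: Tuple[int, int],
                    target_size: Tuple[int, int] = (128, 128)) -> List[Point]:
    """Map (x, y) landmarks from the original (width, height) to the resized image."""
    orig_w, orig_h = original_size
    new_w, new_h = target_size
    sx, sy = new_w / orig_w, new_h / orig_h
    return [(x * sx, y * sy) for x, y in landmarks]


def train_test_split(samples: Sequence, train_fraction: float = 0.8,
                     seed: int = 0) -> Tuple[list, list]:
    """Shuffle sample indices with a fixed seed and split them 80/20 by default."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * len(indices)))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


# A landmark at (256, 128) in a 512x256 image maps to (64, 64) after resizing.
print(scale_landmarks([(256, 128)], original_size=(512, 256)))

train_ids, test_ids = train_test_split(list(range(10)))
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the split reproducible, so the same images stay in the test set across training runs.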
Figure 5. EfficientNet model accuracy and loss


Driver drowsiness monitoring system based on facial Landmark detection with ... (Roaa Albasrawi) 


ISSN: 2302-9285


Figure 6. Drowsy detection results


5. CONCLUSION

This work considered the recognition of eye and mouth status to help judge drowsiness. A ConvNet
model based on EfficientNet was proposed to predict the facial landmarks, which are
mapped to the facial key points on the input face in real time. The results above, with a landmark
prediction accuracy of approximately 82% and the detection of closed eyes and yawning in test videos,
indicate that the proposed model is effective and usable. In the future, the proposed model can be
implemented in Android applications to help avoid accidents caused by driver drowsiness.